Abstract: We consider a simple transformation (coding) of an iid source called 
a bit-shift channel. This simple transformation occurs naturally in magnetic 
or optical data storage. The resulting process is not Markov of any order. We 
discuss methods of computing the entropy of the transformed process, and 
study some of its properties. 



Results presented in this paper originate from the discussions we had at the "Coding 
Club", the weekly seminar on coding theory at the Philips Research Laboratories 
in Eindhoven. Mike Keane, when his active travelling schedule permits, also 
attends this seminar. We would like to use this opportunity to thank Mike for 
his active participation, for pleasant and fruitful discussions, and for his inspiration, 
which we had the pleasure to share. 



1. Bit-shift channel 

In this paper we consider a simplified model for errors occurring in the readout of 
digital information stored on an optical recording medium like the Compact Disc 
(CD) or the Digital Versatile Disc (DVD). For more detailed information on optical 
storage see [?] or [?]. 

On optical disks the information is stored in a reflectivity pattern. For technical 
reasons, it is advantageous to use only two states, i.e. "low" and "high" reflectivity. 
Figure 1 shows the disk surfaces for two types of DVDs. While the presence of 
only two states greatly simplifies the detection of the state, it reduces the maximum 
spatial frequency, and hence the storage capacity. 

In this situation it is better not to encode the information in the reflectivity state 
itself but rather in the location of the transitions: The reflectivity pattern consists 
of an alternating sequence of "high" and "low" marks of varying length (an integer 
multiple of some small length unit), where each mark has at least a minimal length, 
say d + 1 units. Hence, this "run-length limited" (RLL) encoding makes sure no 
mark is too short for the disk while the information density is only limited by the 
accuracy of determining the length of the marks, or equivalently the location of the 
transitions. For technical reasons (to recover the length unit from the signal itself) 
another constraint is imposed: no mark may exceed k + 1 units, k > d. (For the 
CD, (d, k) = (2, 10).) 

Fig 1. Images of DVD disks. The left image shows a DVD-ROM. The track is formed by pressing 
the disk mechanically onto a master disk. On the right is an image of a rewritable disk. The 
resolution has been increased to demonstrate the irregularities in the track produced by the laser. 
These irregularities lead to higher probabilities of jitter errors. 

It is customary to describe RLL sequences by their transitions: A (d, k)-RLL 
sequence has at least d and at most k '0's between consecutive '1's. So a "high" mark of 4 
units, followed by a "low" of 3 units, followed by a "high" of 4 units corresponds to 
the RLL-sequence 100010010001 written to the disk. 
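
The transition convention can be illustrated with a short Python sketch. The function names `marks_to_rll` and `satisfies_rll` are illustrative choices and do not come from the paper; the sketch assumes that each mark of m units contributes a '1' followed by m - 1 '0's, and that a closing '1' marks the final transition.

```python
def marks_to_rll(mark_lengths):
    """Map mark lengths (in units) to the transition sequence written to disk."""
    bits = []
    for m in mark_lengths:
        if m < 1:
            raise ValueError("mark lengths must be positive")
        # A mark of m units starts with a transition '1' followed by m - 1 zeros.
        bits.append("1" + "0" * (m - 1))
    # Closing transition at the end of the last mark.
    return "".join(bits) + "1"


def satisfies_rll(seq, d, k):
    """Check that every run of '0's between consecutive '1's has length in [d, k]."""
    runs = seq.strip("0").split("1")[1:-1]
    return all(d <= len(r) <= k for r in runs)


print(marks_to_rll([4, 3, 4]))                # 100010010001
print(satisfies_rll("100010010001", 2, 10))   # True
```

For the marks of lengths 4, 3, 4 this reproduces the sequence quoted in the text, which satisfies the (2, 10) constraint.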

At the time the RLL-sequence is read from disk, the transitions (the '1's) might 
be detected at different positions due to noise, inter-symbol interference, clock jitter, 
and other distortions. In the simplest version of this "bit-shift channel model" each 
'1' may be detected one unit early, on time, or one unit late with the probabilities 
($\varepsilon$, $1 - 2\varepsilon$, $\varepsilon$), $0 < \varepsilon < 1/2$, and the shifts are independent. 

More formally, suppose X is the length of a continuous interval of low or high 
marks on the disk. Then, after reading, the detected length is 

$$ Y = X + \omega_{\mathrm{left}} - \omega_{\mathrm{right}}, \qquad (1) $$

where $\omega_{\mathrm{left}}$, $\omega_{\mathrm{right}}$ take values in $\{-1, 0, 1\}$, and $\omega = -1, 0, 1$ means that the 
transition between the "low"-"high" or "high"-"low" runs was detected one time unit 
too early, correctly, or one unit too late, respectively. Note that for two consecutive 
intervals, $\omega_{\mathrm{right}}$ of the first interval is $\omega_{\mathrm{left}}$ of the second. The simplest model 
for the distribution of the time shifts $\omega_{\mathrm{left}}$ is to assume that they are independent for 
different intervals (runs), and 

$$ \mathbb{P}(\omega_{\mathrm{left}} = -1) = \mathbb{P}(\omega_{\mathrm{left}} = 1) = \varepsilon, \qquad \mathbb{P}(\omega_{\mathrm{left}} = 0) = 1 - 2\varepsilon, $$

for some $\varepsilon \in [0, 1/2]$. 
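
As a minimal sketch of the model in (1), the following Python function draws one independent shift per transition and applies it to a list of run lengths. The name `jitter_runs` and the optional `rng` argument are assumptions made for illustration; the paper does not prescribe an implementation.

```python
import random


def jitter_runs(runs, eps, rng=None):
    """Apply independent transition shifts to a list of run lengths.

    Transition n is the left boundary of run n and the right boundary of run n-1.
    Its shift is -1, 0 or +1 with probabilities (eps, 1 - 2*eps, eps), and run n
    is read as runs[n] + shift[n] - shift[n + 1], following Eq. (1).
    """
    if not 0.0 <= eps <= 0.5:
        raise ValueError("eps must lie in [0, 1/2]")
    rng = rng or random.Random()
    shifts = rng.choices([-1, 0, 1], weights=[eps, 1 - 2 * eps, eps],
                         k=len(runs) + 1)
    return [x + shifts[n] - shifts[n + 1] for n, x in enumerate(runs)]


# Example: a few CD-like run lengths read through a channel with eps = 0.1.
print(jitter_runs([3, 5, 4, 11, 3], 0.1, random.Random(0)))
```

Because neighbouring runs share a shift, the detected lengths are not independent: the total length changes by at most 2 units, whatever the number of runs.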

An important question then is: Given (d, k), $\varepsilon$, and some distribution for the 
input sequences (e.g. run-lengths uniformly distributed in $\{d, \ldots, k\}$), what is the 
mutual information between input and output sequences? In other words, how much 
can be learned about the input from observing the output, on average? The problem 
of computing the mutual information is equivalent to the computation of the entropy 
of the output sequence, see [?]. 

The supremum of this mutual information over all possible measures on the space 
of input sequences is called the "channel capacity". 
1.1. Model 

Let us describe the bit-shift channel as a continuous transformation (factor) of a 
certain subshift of finite type. Let $\mathcal{A} = \{d, \ldots, k\}$, where $d, k \in \mathbb{N}$, $d < k$ and $d \ge 2$. 
The input space then is $\mathcal{A}^{\mathbb{Z}} = \{x = (x_i) : x_i \in \mathcal{A}\}$. Consider also a finite alphabet 
$\Omega$ with 9 symbols 

$$ \Omega = \{(-1,-1), (-1,0), (-1,1), (0,-1), (0,0), (0,1), (1,-1), (1,0), (1,1)\}. $$

Finally, consider a subshift of finite type $\Omega_J \subset \Omega^{\mathbb{Z}}$ defined as 

$$ \Omega_J = \{(\omega_n) \in \Omega^{\mathbb{Z}} : \omega_{n,2} = \omega_{n+1,1} \text{ for all } n \in \mathbb{Z}\}, $$

where $\omega_n = (\omega_{n,1}, \omega_{n,2})$. The factor map $\phi$ is defined on $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$ as follows: 
$y = \phi(x, \omega)$ with 

$$ y_n = x_n + \omega_{n,1} - \omega_{n,2} \quad \text{for all } n. \qquad (2) $$

Note that the output space $\mathcal{O} = \phi(\mathcal{A}^{\mathbb{Z}} \times \Omega_J)$ is a subshift of $\mathcal{B}^{\mathbb{Z}}$, with $\mathcal{B} = 
\{d-2, \ldots, k+2\}$. Clearly, $\mathcal{O} \ne \mathcal{B}^{\mathbb{Z}}$. For example, $(d-2, d-2)$ cannot occur in 
any output sequence. Indeed, if $y_n = d-2$, then $x_n = d$, $\omega_{n,1} = -1$ and $\omega_{n,2} = 1$. 
But then, $y_{n+1} = x_{n+1} + \omega_{n+1,1} - \omega_{n+1,2} = x_{n+1} + \omega_{n,2} - \omega_{n+1,2} \ge d + 1 - 1 = d$. 
With a similar argument, one concludes that for any $L \ge 1$ 

$$ [d-2, \underbrace{d, \ldots, d}_{L \text{ times}}, d-2] \quad \text{or} \quad [d-2, \underbrace{d, \ldots, d}_{L \text{ times}}, d-1] $$

do not occur in any output sequence $y \in \mathcal{O}$. Therefore, there is an infinite number 
of minimal forbidden words, i.e., forbidden words all of whose subwords are allowed. 
Hence, $\mathcal{O}$ is not a subshift of finite type. However, since $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$ is a subshift of 
finite type, $\mathcal{O}$ is sofic [13]. 
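
The forbidden words above can be checked by brute force for small parameters. The following Python sketch enumerates all output words of a given length produced by (2); the function name `output_words` and the choice (d, k) = (2, 4) are illustrative assumptions, and the enumeration is exponential in the word length, so it is only meant for tiny examples.

```python
from itertools import product


def output_words(d, k, n):
    """Return the set of all length-n words (y_1, ..., y_n) produced by Eq. (2)."""
    words = set()
    for x in product(range(d, k + 1), repeat=n):
        # n runs have n + 1 boundaries; neighbouring runs share one shift.
        for s in product((-1, 0, 1), repeat=n + 1):
            words.add(tuple(x[i] + s[i] - s[i + 1] for i in range(n)))
    return words


d, k = 2, 4
w2, w3 = output_words(d, k, 2), output_words(d, k, 3)
print((d - 2, d - 2) in w2)         # False: forbidden
print((d - 2, d, d - 2) in w3)      # False: forbidden (L = 1)
print((d - 2, d) in w2, (d, d - 2) in w2)   # True True: its subwords are allowed
```

The last line illustrates minimality: the word [d-2, d, d-2] is forbidden although both of its two-letter subwords occur.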



1.2. Capacity of a bit-shift channel 

Suppose $\mathbb{J}$ is a measure on $\Omega_J$. For example, in this paper we will be mainly 
interested in Markov measures $\mathbb{J}_\varepsilon$ on $\Omega_J$, obtained in a natural way from Bernoulli 
measures on $\{-1, 0, 1\}$ with probabilities $\varepsilon$, $1 - 2\varepsilon$, and $\varepsilon$, respectively. If $\mathbb{P}$ is a 
translation invariant measure on $\mathcal{A}^{\mathbb{Z}}$, then we obtain a measure $\mathbb{Q}$ on $\mathcal{O}$, which is 
the push forward of $\mathbb{P} \times \mathbb{J}$. We use the standard notation $\mathbb{Q} = (\mathbb{P} \times \mathbb{J}) \circ \phi^{-1}$. 

From the information-theoretical point of view, an important quantity is the 
capacity of the channel. The capacity of a bit-shift channel specified by $\mathbb{J}$ is defined 
as 

$$ C_{\mathrm{bitshift}}(\mathbb{J}) = \sup_{\mathbb{P}} h\bigl((\mathbb{P} \times \mathbb{J}) \circ \phi^{-1}\bigr) - h(\mathbb{J}), \qquad (3) $$

where the supremum is taken over $\mathcal{P}(\mathcal{A}^{\mathbb{Z}})$, the set of all translation invariant 
probability measures on $\mathcal{A}^{\mathbb{Z}}$, and $h(\cdot)$ is the entropy. 

Even for 'Bernoulli' measures $\mathbb{J}_\varepsilon$, the capacity $C_{\mathrm{bitshift}}(\mathbb{J}_\varepsilon)$ is not known. It is 
relatively easy to see that the supremum in (3) is achieved. However, the properties 
of maximizing measures are not known. It is expected that maximizing measures 
are not Markov of any order. Finally, if one is interested in the topological entropy of 
$\mathcal{O}$: 

$$ h_{\mathrm{top}}(\mathcal{O}) = \sup_{\mathbb{Q}} h(\mathbb{Q}), $$

then $h_{\mathrm{top}}(\mathcal{O})$ is easily computable using standard methods [?] or using the efficient 
numerical approach of [?]. 
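
One elementary way to approximate the topological entropy is to count the allowed output words of length n, since (1/n) log of this count is an upper bound that converges to the topological entropy. The Python sketch below counts words with a subset construction over the shared shift. This is a generic textbook technique chosen here for illustration, not necessarily the method of the cited references, and the function names are hypothetical.

```python
from math import log


def count_output_words(d, k, n):
    """Count distinct length-n output words via a subset construction.

    A state is the set of values that the shared shift (the right shift of the
    last run) may take given the word read so far. Each allowed word reaches
    exactly one non-empty state, so summing the path counts counts the words.
    """
    shifts = (-1, 0, 1)
    counts = {frozenset(shifts): 1}      # empty word: any left shift is possible
    for _ in range(n):
        nxt = {}
        for state, c in counts.items():
            for y in range(d - 2, k + 3):
                # y = x + s - t with d <= x <= k, s in state, t the new shared shift
                succ = frozenset(t for t in shifts
                                 if any(d <= y - s + t <= k for s in state))
                if succ:
                    nxt[succ] = nxt.get(succ, 0) + c
        counts = nxt
    return sum(counts.values())


def entropy_upper_estimate(d, k, n):
    """(1/n) log |B_n| in nats: an upper bound on the topological entropy."""
    return log(count_output_words(d, k, n)) / n


for n in (1, 5, 20, 80):
    print(n, entropy_upper_estimate(2, 10, n))
```

Since the output shift contains the full shift on {d, ..., k} (all shifts equal to zero), every estimate lies between log(k - d + 1) and log(k - d + 5).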
2. Entropy of a bit-shift channel 

Suppose that $\{X_n\}$ are independent identically distributed random variables taking 
values in $\mathcal{A} = \{d, \ldots, k\}$, and let $\mathbb{P}$ be the corresponding distribution. What is the 
entropy of $\mathbb{Q} = (\mathbb{P} \times \mathbb{J}_\varepsilon) \circ \phi^{-1}$? 

Note that $(X_n, \omega_n)$ is a Markov chain, and, hence, $Y_n$, given by (2), is a function 
of a Markov chain. 

Let us start by recalling some methods of computing the entropy of processes 
which are functions of Markov chains. Suppose $\mathcal{X} = \{X_n\}$, $n \in \mathbb{Z}$, is a stationary 
ergodic Markov chain taking values in a finite alphabet $\mathcal{A}$. Let $\phi : \mathcal{A} \to \mathcal{B}$ be 
some map, and consider a process $\mathcal{Y} = \{Y_n\}$, defined by 

$$ Y_n = \phi(X_n) \quad \text{for all } n \in \mathbb{Z}. $$

The following result [?], see also [?, Theorem 4.4.1], provides sharp estimates on 
the entropy of $\mathcal{Y}$. 

Theorem 2.1. If $\mathcal{X}$ is a Markov chain and $\mathcal{Y} = \phi(\mathcal{X})$, then for every $n \ge 1$ one 
has 

$$ H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n) \le h(\mathcal{Y}) \le H(Y_0 \mid Y_1, \ldots, Y_{n-1}, Y_n). $$

Moreover, as $n \nearrow \infty$, 

$$ H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n) \nearrow h(\mathcal{Y}), \qquad H(Y_0 \mid Y_1, \ldots, Y_{n-1}, Y_n) \searrow h(\mathcal{Y}). $$

Birch [?] has shown that under some additional conditions, the convergence is 
in fact exponential: 

$$ |h(\mathcal{Y}) - H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n)| \le C\rho^n, $$
$$ |h(\mathcal{Y}) - H(Y_0 \mid Y_1, \ldots, Y_{n-1}, Y_n)| \le C\rho^n, $$

where $\rho \in (0, 1)$ is independent of the factor map $\phi$. 
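
For a small chain, both bounds of Theorem 2.1 can be evaluated by direct enumeration of paths. The Python sketch below does this under illustrative assumptions: the chain is given by a transition matrix `P` with a stationary vector `pi` supplied by the caller, the map is a list `phi`, and the function names are hypothetical. The cost grows exponentially in n, so it only serves to make the statement concrete.

```python
from itertools import product
from math import log


def block_entropy(joint):
    """Shannon entropy (nats) of a distribution given as {outcome: probability}."""
    return -sum(p * log(p) for p in joint.values() if p > 0)


def entropy_bounds(P, pi, phi, n):
    """Return (H(Y0 | Y1..Y_{n-1}, X_n), H(Y0 | Y1..Y_n)) for Y = phi(X)."""
    states = range(len(pi))
    full, tail, full_x, tail_x = {}, {}, {}, {}
    for path in product(states, repeat=n + 1):        # (x_0, ..., x_n)
        p = pi[path[0]]
        for a, b in zip(path, path[1:]):
            p *= P[a][b]
        if p == 0:
            continue
        y = tuple(phi[s] for s in path)
        for table, key in ((full, y),                          # (Y0..Yn)
                           (tail, y[1:]),                      # (Y1..Yn)
                           (full_x, y[:-1] + (path[-1],)),     # (Y0..Y_{n-1}, X_n)
                           (tail_x, y[1:-1] + (path[-1],))):   # (Y1..Y_{n-1}, X_n)
            table[key] = table.get(key, 0.0) + p
    upper = block_entropy(full) - block_entropy(tail)
    lower = block_entropy(full_x) - block_entropy(tail_x)
    return lower, upper


# Example: a 3-state circulant chain (uniform stationary law), two states merged.
P = [[0.7, 0.2, 0.1], [0.1, 0.7, 0.2], [0.2, 0.1, 0.7]]
pi = [1 / 3] * 3
for n in range(1, 6):
    print(n, entropy_bounds(P, pi, [0, 0, 1], n))
```

When `phi` is the identity, both bounds coincide with the entropy rate of the chain for every n, which gives a simple sanity check.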

Let us give a proof of Theorem 2.1 since it is very short and provides us with 
some useful intuition. 

Proof of Theorem 2.1. An upper estimate of $h(\mathcal{Y})$ in terms of $H(Y_0 \mid Y_1, \ldots, Y_n)$ 
and the monotonic convergence of $H(Y_0 \mid Y_1, \ldots, Y_n)$ to $h(\mathcal{Y})$ are standard facts. For 
the lower estimate we proceed as follows: for any $m \in \mathbb{N}$ one has 

$$ H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n) = H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n, \ldots, X_{n+m}) \qquad (4) $$
$$ = H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n, \ldots, X_{n+m}, Y_n, \ldots, Y_{n+m}) \qquad (5) $$
$$ \le H(Y_0 \mid Y_1, \ldots, Y_{n-1}, Y_n, \ldots, Y_{n+m}), \qquad (6) $$

where in (4) we used the Markov property of $\mathcal{X}$, and (5), (6) follow from the 
standard properties of conditional entropies: $Y_n, \ldots, Y_{n+m}$ are functions of 
$X_n, \ldots, X_{n+m}$, and dropping conditions cannot decrease the entropy. Since 

$$ h(\mathcal{Y}) = \lim_{m \to \infty} H(Y_0 \mid Y_1, \ldots, Y_{n-1}, Y_n, \ldots, Y_{n+m}), $$

we obtain the lower estimate of $h(\mathcal{Y})$. Moreover, using standard properties of 
conditional entropies, we immediately conclude that $H(Y_0 \mid Y_1, \ldots, Y_{n-1}, X_n)$ is 
monotonically increasing with n. 
To prove that the lower bound actually converges to $h(\mathcal{Y})$, we proceed as follows. 
Note that 

$$ H(X_{n+1}) \ge H(X_{n+1} \mid Y_n) - H(X_{n+1} \mid Y_n, \ldots, Y_0) $$
$$ = \sum_{i=0}^{n-1} \bigl[ H(X_{n+1} \mid Y_n, \ldots, Y_{i+1}) - H(X_{n+1} \mid Y_n, \ldots, Y_i) \bigr] $$
$$ = \sum_{i=0}^{n-1} \bigl[ H(X_{n-i+1} \mid Y_{n-i}, \ldots, Y_1) - H(X_{n-i+1} \mid Y_{n-i}, \ldots, Y_0) \bigr] $$
$$ = \sum_{j=1}^{n} \bigl[ H(X_{j+1} \mid Y_j, \ldots, Y_1) - H(X_{j+1} \mid Y_j, \ldots, Y_0) \bigr] = \sum_{j=1}^{n} c_j, $$

where 

$$ c_j = H(X_{j+1} \mid Y_j, \ldots, Y_1) - H(X_{j+1} \mid Y_j, \ldots, Y_0), \quad j = 1, \ldots, n. $$

Since $c_j \ge 0$ and $\sum_{j=1}^{n} c_j \le H(X_1) < \infty$ for all n, we conclude that $c_n \to 0$ as 
$n \to \infty$. Moreover, 

$$ c_n = H(X_{n+1} \mid Y_n, \ldots, Y_1) - H(X_{n+1} \mid Y_n, \ldots, Y_0) $$
$$ = H(Y_1, \ldots, Y_n, X_{n+1}) - H(Y_1, \ldots, Y_n) - H(Y_0, Y_1, \ldots, Y_n, X_{n+1}) + H(Y_0, Y_1, \ldots, Y_n) $$
$$ = H(Y_0 \mid Y_1, \ldots, Y_n) - H(Y_0 \mid Y_1, \ldots, Y_n, X_{n+1}). $$

Finally, since $H(Y_0 \mid Y_1, \ldots, Y_n)$ converges to $h(\mathcal{Y})$, so does $H(Y_0 \mid Y_1, \ldots, Y_n, X_{n+1})$. 

□ 

Let us conclude this section with one general remark. Suppose $\mathcal{Y}$ is a factor of 
$\mathcal{X}$, i.e. $\mathcal{Y} = \phi(\mathcal{X})$, where $\mathcal{X}$ is some ergodic process. For $n, m \in \mathbb{N}$, let 

$$ d_{n,m} = H(Y_0 \mid Y_1, \ldots, Y_n, X_{n+1}, \ldots, X_{n+m}). $$

Note that $d_{n,m} \ge d_{n,m+1} \ge 0$, and hence $\lim_{m \to \infty} d_{n,m} =: D_n$ exists. Since for any 
$n, m \in \mathbb{N}$ 

$$ d_{n,m} = H(Y_0 \mid Y_1, \ldots, Y_n, X_{n+1}, \ldots, X_{n+m}) \le H(Y_0 \mid Y_1, \ldots, Y_n, Y_{n+1}, \ldots, Y_{n+m}), $$

we conclude that $D_n \le h(\mathcal{Y})$. Note also that since $d_{n,m} \le d_{n+1,m-1}$, one has 
$D_{n+1} \ge D_n$. 

The natural question is under which conditions $D_n$ converges to $h(\mathcal{Y})$ as 
$n \to \infty$. For this we need a certain regularity of the conditional probabilities of the 
$\mathcal{X}$-process. For example, if the conditional probabilities are continuous, i.e., if 

$$ r_n = \sup_{X_0, \ldots, X_n} \; \sup_{X', X''} \bigl| \mathbb{P}(X_0 \mid X_1, \ldots, X_n, X'_{n+1}, X'_{n+2}, \ldots) - \mathbb{P}(X_0 \mid X_1, \ldots, X_n, X''_{n+1}, X''_{n+2}, \ldots) \bigr| \to 0, \quad n \to \infty, $$

then $D_n \to h(\mathcal{Y})$. Gibbs measures and g-measures (see Section 4) have continuous 
conditional probabilities. 
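2.1. Entropy via a prefix code 

In this section we recall the approach to efficient computation of entropies of factor 
processes $\mathcal{Y} = \phi(\mathcal{X})$, where $\mathcal{X}$ is Markov, which was originally proposed in [2, ?]. 
The inequalities of Theorem 2.1 can be rewritten as follows 

$$ \sum_{y_1^n \in \mathcal{B}^n} \mathbb{P}(y_1^n) \, H_{\mathbb{P}(\cdot \mid y_1^n)}(Y_0 \mid X_{n+1}) \le h(\mathcal{Y}) \le \sum_{y_1^n \in \mathcal{B}^n} \mathbb{P}(y_1^n) \, H_{\mathbb{P}(\cdot \mid y_1^n)}(Y_0), \qquad (7) $$

where we use the following notation 

$$ y_1^n = (y_1, y_2, \ldots, y_n) \in \mathcal{B}^n, $$
$$ \mathbb{P}(y_1^n) = \mathbb{P}(Y_1 = y_1, \ldots, Y_n = y_n), $$
$$ \mathbb{P}(\cdot \mid y_1^n) = \mathbb{P}(\cdot \mid Y_1 = y_1, \ldots, Y_n = y_n). $$

The subindex $\mathbb{P}(\cdot \mid y_1^n)$ in (7) stresses that the entropy of $Y_0$ and the conditional 
entropy of $Y_0$ given $X_{n+1}$ are computed using $\mathbb{P}(\cdot \mid y_1^n)$. 

Note that the sum in (7) is taken over elements of a partition of $\mathcal{B}^{\mathbb{Z}}$ into cylinders 
of length n: 

$$ \mathcal{W}_n = \{ [y_1^n] \mid y_1^n \in \mathcal{B}^n \}, \qquad [y_1^n] = \{ \tilde{y} \in \mathcal{B}^{\mathbb{Z}} : \tilde{y}_1 = y_1, \ldots, \tilde{y}_n = y_n \}. $$

In fact, an estimate similar to (7) holds for any partition of $\mathcal{B}^{\mathbb{Z}}$ into cylindric sets, 
see [?, Theorem 1]. 

Theorem 2.2. Let $\mathcal{W}$ be a finite partition of $\mathcal{B}^{\mathbb{Z}}$ into cylindric sets: 

$$ \mathcal{W} = \{ [w_i] : w_i = (w_{i,1}, \ldots, w_{i,|w_i|}), \; i = 1, \ldots, M \}. $$

Then 

$$ \sum_{w \in \mathcal{W}} h_1(w) \le h(\mathcal{Y}) \le \sum_{w \in \mathcal{W}} h(w), \qquad (8) $$

where 

$$ h_1(w) = \mathbb{P}(Y_1 \ldots Y_{|w|} = w) \, H(Y_0 \mid Y_1 \ldots Y_{|w|} = w, X_{|w|+1}), $$
$$ h(w) = \mathbb{P}(Y_1 \ldots Y_{|w|} = w) \, H(Y_0 \mid Y_1 \ldots Y_{|w|} = w). $$

Theorem 2.2 leads to the following algorithm. Suppose $\mathcal{W}$ is some partition 
into cylinders. We can refine the partition $\mathcal{W}$ by removing a certain word w from 
$\mathcal{W}$ and adding all words of the form wb, where $b \in \mathcal{B}$, i.e., 

$$ \mathcal{W}' = \mathcal{W} \setminus \{w\} \cup \{ wb \mid b \in \mathcal{B} \}. \qquad (9) $$

Suppose $\{\mathcal{W}_k\}_{k \ge 1}$ is a sequence of partitions such that for each k, $\mathcal{W}_{k+1}$ is a 
refinement of $\mathcal{W}_k$ as in (9), and at each step a word $w \in \mathcal{W}_k$ is selected such that 

$$ h(w) - h_1(w) = \max_{u \in \mathcal{W}_k} \bigl( h(u) - h_1(u) \bigr). \qquad (10) $$

The greedy strategy (10), as well as some other strategies (e.g., uniform, $|w| = 
\min_{u \in \mathcal{W}_k} |u|$), guarantees the convergence of the upper and lower estimates in (8), 
i.e., 

$$ \lim_{k \to \infty} \sum_{w \in \mathcal{W}_k} \bigl( h(w) - h_1(w) \bigr) = 0. $$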
2.2. Entropy via renewal times 

As before, suppose that $\{X_n\}$ are independent and identically distributed in $\{d, 
\ldots, k\}$ with $\mathbb{P}(X_i = \ell) = p_\ell$, $\ell = d, \ldots, k$. Assume also that $p_d > 0$. 

Another method for estimating the entropy is based on the following observation. 
Suppose $Y_n = d - 2$ for some n. This implies that $X_n = d$, $\omega_n = (-1, 1)$. Since the 
sequence $\{\omega_k\}$ forms a Markov chain, $(\ldots, \omega_{n-2}, \omega_{n-1})$ and $(\omega_{n+1}, \omega_{n+2}, \ldots)$ are 
independent given $\omega_n$. Therefore, since $Y_n = d - 2$ implies $\omega_n = (-1, 1)$, we conclude 
that $(\ldots, \omega_{n-2}, \omega_{n-1})$ and $(\omega_{n+1}, \omega_{n+2}, \ldots)$ are independent given $Y_n = d - 2$. 
Moreover, since the $X_n$ form an iid sequence, $(\ldots, Y_{n-2}, Y_{n-1})$ and $(Y_{n+1}, Y_{n+2}, \ldots)$ 
are also independent given $Y_n = d - 2$. 

Consider our subshift $\mathcal{O}$, and a set $C = [d-2] = \{y \in \mathcal{O} : y_0 = d - 2\}$. Let 
$S : \mathcal{O} \to \mathcal{O}$ be the left shift, and consider the induced map $S_C$ on C: 

$$ S_C(y) = S^{R_C(y)}(y), $$

where $R_C(y) = \min\{k \ge 1 : y_k = d - 2\}$. On C, the induced map has a natural 
Bernoulli partition 

$$ \{ [d-2, y_1, \ldots, y_r, d-2] : y_j \in \mathcal{B}, \; y_j \ne d - 2, \; j = 1, \ldots, r, \; r \in \mathbb{N} \}. $$

Finally, by the Abramov formula 

$$ h(\mathbb{Q}) = - \sum_{r=1}^{\infty} \; \sum_{y_1, \ldots, y_r \ne d-2} \mathbb{Q}([d-2, y_1, \ldots, y_r, d-2]) \log \mathbb{Q}([d-2, y_1, \ldots, y_r, d-2]) + \mathbb{Q}([d-2]) \log \mathbb{Q}([d-2]). \qquad (11) $$

Computation of the entropy of images of Markov measures using renewal times 
and the induced map was used in the past, see e.g. [15]. However, in the case of the bit-shift 
channel, the method based on (11) is extremely inefficient. 

3. Numerics 

For illustration we present a numerical computation of the entropy using the prefix 
code method described in Section 2.1. 

The algorithm constructs a sequence of refined partitions $\mathcal{W}_k$ as described above. 
A particularly useful strategy is given by (10). This "greedy" heuristic selects the 
cylinder most responsible for the difference between the upper and lower bounds, in the hope 
that refining this cylinder will tighten the bounds quickly. This strategy is not 
optimal (as can be shown by example) but it has three advantages. Firstly, the 
bounds converge (eventually). Secondly, if in a particular word $w \in \mathcal{W}$ the last 
symbol is the "renewal" symbol $d - 2$ (similarly $k + 2$), this word will never be refined 
again. Thirdly, the next cylinder to expand can be found quickly by representing 
$\mathcal{W}$ as a "priority queue" data structure. 
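
The greedy refinement with a priority queue can be sketched in Python for a generic function of a finite stationary Markov chain. This is a minimal illustration under stated assumptions, not the authors' implementation: the chain is given by a transition matrix `P`, a stationary vector `pi` and a list `phi`, the names `greedy_bounds` and `_gap_terms` are hypothetical, and each word w stores the table alpha[a][s] = P(Y_0 = a, Y_1...Y_|w| = w, X_|w| = s).

```python
import heapq
from math import log


def _gap_terms(alpha, P):
    """Return (h1(w), h(w)) of Theorem 2.2 from the table alpha of a word w."""
    n = len(P)

    def weighted_cond_entropy(rows):   # rows[a][j] = P(Y0 = a, w, J = j)
        total = 0.0
        for j in range(len(rows[0])):
            col = sum(r[j] for r in rows)
            total -= sum(r[j] * log(r[j] / col) for r in rows if r[j] > 0)
        return total                   # equals P(w) * H(Y0 | w, J)

    h = weighted_cond_entropy([[sum(row)] for row in alpha])
    nxt = [[sum(row[s] * P[s][t] for s in range(n)) for t in range(n)]
           for row in alpha]           # joint with the next hidden state
    return weighted_cond_entropy(nxt), h


def greedy_bounds(P, pi, phi, steps):
    """Lower and upper entropy bounds (nats) after `steps` greedy refinements."""
    n, symbols = len(P), sorted(set(phi))

    def extend(alpha, b):              # table of the word w followed by symbol b
        return [[sum(row[s] * P[s][t] for s in range(n)) if phi[t] == b else 0.0
                 for t in range(n)] for row in alpha]

    root = [[pi[s] if phi[s] == a else 0.0 for s in range(n)] for a in symbols]
    heap, lower, upper = [], 0.0, 0.0

    def push(word, alpha):
        nonlocal lower, upper
        if sum(map(sum, alpha)) == 0.0:
            return                     # word of probability zero
        h1, h = _gap_terms(alpha, P)
        lower, upper = lower + h1, upper + h
        heapq.heappush(heap, (-(h - h1), word, alpha, h1, h))

    for b in symbols:                  # initial partition: all one-letter words
        push((b,), extend(root, b))
    for _ in range(steps):
        if not heap:
            break
        _, word, alpha, h1, h = heapq.heappop(heap)   # word with the largest gap
        lower, upper = lower - h1, upper - h
        for b in symbols:
            push(word + (b,), extend(alpha, b))
    return lower, upper
```

A call such as `greedy_bounds(P, pi, [0, 0, 1], 40)` returns a pair that brackets the entropy of the lumped process. Using a heap keyed on the negated gap h(w) - h_1(w) makes the selection in (10) a logarithmic-time operation.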

For illustration, we run the algorithm for the model of the jitter channel 
described in Section 1.1. The parameters are inspired by the Compact Disc: The 
error-correction and modulation system of the CD essentially produces an RLL-sequence 
with parameters (d, k) = (2, 10). We model the run-lengths as independent identically 
distributed random variables with probabilities $p_\ell = p_2 \gamma^{\ell - 2}$, $\ell \in \{2, \ldots, 10\}$, 
where $\gamma = 0.658$ and $p_2$ is chosen such that $\sum_\ell p_\ell = 1$. This truncated geometric 
model with $\gamma = 0.658$ is a very good approximation of the (marginal) run-length 
distribution observed on the CD. 

Fig 2. Mutual information $I(\mathcal{Y}; \mathcal{X}) = h(\mathcal{Y}) - h(\mathbb{J}_\varepsilon)$ as a function of $\varepsilon$, for (d, k) = (2, 10) and 
the truncated geometric distribution for X. 

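Figure 2 shows the mutual information 

$$ I(\mathcal{Y}; \mathcal{X}) = h(\mathcal{Y}) - h(\mathbb{J}_\varepsilon) = h(\mathcal{Y}) + 2\varepsilon \log \varepsilon + (1 - 2\varepsilon) \log(1 - 2\varepsilon) $$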

as a function of e. The horizontal line represents the rate designed for the last stage 
of the encoding used in the CD (the so-called EFM code). If the jitter is so strong 
that the mutual information drops below this rate, reliable decoding is impossible. 
In practice, similar plots are used to evaluate the performance of particular encoding 
schemes with respect to various distortions introduced by the physical channel. 
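
The ingredients of this computation other than the output entropy itself are elementary and can be sketched in Python. The function names are illustrative, entropies are in nats (an assumption, since the unit of the plot is not stated here), and the output entropy `h_output` must be supplied by a separate estimator such as the prefix code method.

```python
from math import log


def truncated_geometric(d, k, gamma):
    """Return {l: p_l} with p_l proportional to gamma**(l - d) for l = d..k."""
    weights = [gamma ** (l - d) for l in range(d, k + 1)]
    total = sum(weights)
    return {l: w / total for l, w in zip(range(d, k + 1), weights)}


def jitter_entropy(eps):
    """Entropy (nats) of the shift distribution (eps, 1 - 2*eps, eps)."""
    terms = [eps, 1 - 2 * eps, eps]
    return -sum(p * log(p) for p in terms if p > 0)


def mutual_information(h_output, eps):
    """I = h(Y) - h(J_eps), with h_output supplied by an entropy estimator."""
    return h_output - jitter_entropy(eps)


p = truncated_geometric(2, 10, 0.658)      # CD-like run-length model
input_entropy = -sum(q * log(q) for q in p.values())
print(p[2], input_entropy, jitter_entropy(0.01))
```

For eps = 0 the jitter entropy vanishes, so the mutual information reduces to the entropy of the run-length process itself.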

Figure 3 compares the greedy and uniform heuristics. The standard estimate 
$H(Y_0 \mid Y_1, \ldots, Y_n)$ in fact corresponds to the uniform refinement. Observe the superior 
rate of convergence of the greedy refinement strategy. 

4. Thermodynamics of jittered measures 

Bernoulli and Markov measures belong to a wider class of the so-called Gibbs 
measures. Bernoulli and Markov measures are also examples of g-measures. 

In the seminal paper [?] M. Keane introduced a class of g-measures. These 
are the measures whose conditional probabilities are given by a continuous and 
strictly positive function g. For subshifts of finite type, the theory of g-measures 
is extensive. For sofic subshifts, the problem of defining g-measures is much more 
complicated. For the first results see the paper by W. Krieger in this volume. 

The thermodynamic formalism allows one to look at Gibbs measures from two 
different sides. First of all, locally, through the conditional probabilities; and secondly, 
globally, through the variational principles. 

Contrary to the class of g-measures, the class of Gibbs measures for a sofic 
subshift is well defined. The natural question is whether a "jittered" measure $\mathbb{Q} = 
(\mathbb{P} \times \mathbb{J}_\varepsilon) \circ \phi^{-1}$ is Gibbs. If the measure is Gibbs and the potential is identified, then, 
using the variational principle, we obtain another method of computing the entropy 
of $\mathbb{Q}$. 

Fig 3. Difference between upper and lower bounds on entropy as a function of $|\mathcal{W}|$, the number 
of cylinders, in partitions built by the greedy and uniform refinement strategies. 

The subshift $\mathcal{O} \subset \mathcal{B}^{\mathbb{Z}}$ satisfies a specification property (as a factor of a subshift 
of finite type $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$ which has a specification property [?]). Hence the results 
of [?] on the existence of Gibbs measures for expansive dynamical systems with the 
specification property are applicable. If $\mathbb{Q}$ were a Gibbs measure for a potential 
f from the Bowen class $V(\mathcal{O})$, then there would exist positive constants c, C such 
that for any $n \ge 0$ and every $y \in \mathcal{O}$ 

$$ c \le \frac{\mathbb{Q}([y_0, \ldots, y_n])}{\exp\bigl( \sum_{k=0}^{n} f(S^k y) - (n+1) P(f) \bigr)} \le C, \qquad (12) $$

where $S : \mathcal{O} \to \mathcal{O}$ is the left shift and $P(f)$ is the topological pressure of f. As a 
corollary of (12) one easily concludes that 

$$ \log \mathbb{Q}(y_0 \mid y_1, \ldots, y_n) = \log \frac{\mathbb{Q}([y_0, \ldots, y_n])}{\mathbb{Q}([y_1, \ldots, y_n])} $$

should be bounded, which is not the case, see [?]. Hence, $\mathbb{Q}$ is not Gibbs for any 
potential from the large class of potentials $V(\mathcal{O})$. Examples of measures $\mathbb{Q}$ such that 
estimates similar to (12) hold for some continuous f and subexponential bounds $c_n$ 
and $C_n$ ($\lim_n n^{-1} \log c_n = \lim_n n^{-1} \log C_n = 0$) have been considered in [?], and 
were shown to be weakly Gibbs. It is not known whether $\mathbb{Q}$ for the bit-shift channel 
is weakly Gibbs for some continuous potential f. 

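Nevertheless, the thermodynamic formalism could be useful in estimating the 
capacity of the bit-shift channel. We recall the notion of a compensation function 
and some results summarized in [19]. First of all, we define the topological pressure 
of real-valued continuous functions defined on $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$ and $\mathcal{O}$. If $f \in C(\mathcal{A}^{\mathbb{Z}} \times \Omega_J)$, 
$g \in C(\mathcal{O})$, the topological pressures of f and g are defined as 

$$ P(f \mid \mathcal{A}^{\mathbb{Z}} \times \Omega_J) = \sup_{\mathbb{S}} \Bigl( h(\mathbb{S}) + \int f \, d\mathbb{S} \Bigr), \qquad P(g \mid \mathcal{O}) = \sup_{\mathbb{Q}} \Bigl( h(\mathbb{Q}) + \int g \, d\mathbb{Q} \Bigr), $$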






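where the suprema are taken over all translation invariant measures on $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$ 
and $\mathcal{O}$, respectively. A measure $\mathbb{S}$ on $\mathcal{A}^{\mathbb{Z}} \times \Omega_J$ is called an equilibrium state for 
$f \in C(\mathcal{A}^{\mathbb{Z}} \times \Omega_J)$ if 

$$ P(f \mid \mathcal{A}^{\mathbb{Z}} \times \Omega_J) = h(\mathbb{S}) + \int f \, d\mathbb{S}. \qquad (13) $$

We define equilibrium states on $\mathcal{O}$ in a similar way. It is well known that every 
measure is an equilibrium state: for every translation invariant measure $\mathbb{S}$ on $\mathcal{A}^{\mathbb{Z}} \times 
\Omega_J$ one can find a continuous function $f : \mathcal{A}^{\mathbb{Z}} \times \Omega_J \to \mathbb{R}$ such that (13) holds. 
Moreover, for any $\mathbb{S} = \mathbb{P} \times \mathbb{J}_\varepsilon$, such an f is of a special form 

$$ f(x, \omega) = f(x) + j_\varepsilon(\omega), $$

where $f : \mathcal{A}^{\mathbb{Z}} \to \mathbb{R}$ and $j_\varepsilon : \Omega_J \to \mathbb{R}$ are continuous functions. (In fact, $j_\varepsilon$ can be 
found explicitly.) 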

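A continuous function $F : \mathcal{A}^{\mathbb{Z}} \times \Omega_J \to \mathbb{R}$ is a compensation function if 

$$ P(F + g \circ \phi \mid \mathcal{A}^{\mathbb{Z}} \times \Omega_J) = P(g \mid \mathcal{O}) $$

for all $g \in C(\mathcal{O})$. Compensation functions exist for factor maps defined on shifts of 
finite type [?]. 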

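An important result is the so-called relative variational principle [?], which 
in our notation states that F is a compensation function if and only if for any 
invariant measure $\mathbb{Q}$ on $\mathcal{O}$ one has 

$$ h(\mathbb{Q}) = \sup \Bigl\{ h(\mathbb{S}) + \int F \, d\mathbb{S} \; : \; \mathbb{S} \circ \phi^{-1} = \mathbb{Q} \Bigr\}. $$

Suppose F is a compensation function, then for $\mathbb{Q} = (\mathbb{P} \times \mathbb{J}_\varepsilon) \circ \phi^{-1}$ we obtain 

$$ h(\mathbb{Q}) \ge h(\mathbb{P} \times \mathbb{J}_\varepsilon) + \int F \, d(\mathbb{P} \times \mathbb{J}_\varepsilon) = h(\mathbb{P}) + h(\mathbb{J}_\varepsilon) + \int F_\varepsilon \, d\mathbb{P}, \qquad (14) $$

where $F_\varepsilon(x) = \int_{\Omega_J} F(x, \omega) \, \mathbb{J}_\varepsilon(d\omega)$. For the capacity of the bit-shift channel we 
obtain the following lower estimate 

$$ C_{\mathrm{bitshift}}(\mathbb{J}_\varepsilon) = \sup_{\mathbb{Q} = (\mathbb{P} \times \mathbb{J}_\varepsilon) \circ \phi^{-1}} h(\mathbb{Q}) - h(\mathbb{J}_\varepsilon) \ge \sup_{\mathbb{P}} \Bigl( h(\mathbb{P}) + \int F_\varepsilon \, d\mathbb{P} \Bigr) = P(F_\varepsilon \mid \mathcal{A}^{\mathbb{Z}}). \qquad (15) $$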

An interesting question is whether the inequalities in (14) and (15) are strict. The 
inequality (14) is most probably strict in the generic situation. Indeed, by Corollary 
3.4 of [?], if $\mathbb{Q}$ is an equilibrium state for g on $\mathcal{O}$, and $\mathbb{S}$ is such that $\mathbb{S} \circ \phi^{-1} = \mathbb{Q}$ and 

$$ h(\mathbb{Q}) = h(\mathbb{S}) + \int F \, d\mathbb{S}, $$

then $\mathbb{S}$ is an equilibrium state for $F + g \circ \phi$, and conversely. On the other hand, if 
$\mathbb{S} = \mathbb{P} \times \mathbb{J}_\varepsilon$, then $\mathbb{S}$ is an equilibrium state for $f(x, \omega) = f(x) + j_\varepsilon(\omega)$. Therefore, 
for the equality in (14), it is necessary that $F(x, \omega) + (g \circ \phi)(x, \omega)$ and $f(x) + j_\varepsilon(\omega)$ 
are physically equivalent, i.e., have the same set of equilibrium states. In fact, it is 
quite difficult to imagine how for a given compensation function F of the bit-shift 
channel and a generic g one could find f to satisfy the requirement of physical 
equivalence. On the other hand, it is not very difficult to see that in fact 

$$ C_{\mathrm{bitshift}}(\mathbb{J}_\varepsilon) = \sup_{F} P(F_\varepsilon \mid \mathcal{A}^{\mathbb{Z}}), \qquad (16) $$

where the supremum is taken over all compensation functions F. Indeed, suppose 
$\mathbb{Q}^* = (\mathbb{P}^* \times \mathbb{J}_\varepsilon) \circ \phi^{-1}$ is a 'maximal' ergodic measure, i.e., 

$$ C_{\mathrm{bitshift}}(\mathbb{J}_\varepsilon) = h(\mathbb{Q}^*) - h(\mathbb{J}_\varepsilon). $$

Then there exist continuous functions $g^* \in C(\mathcal{O})$ and $f^* \in C(\mathcal{A}^{\mathbb{Z}})$ such that $\mathbb{Q}^*$ 
and $\mathbb{P}^*$ are equilibrium states for $g^*$ and $f^*$, respectively. But then 

$$ F(x, \omega) = f^*(x) + j_\varepsilon(\omega) - (g^* \circ \phi)(x, \omega) $$

is the compensation function for which the maximum in (16) is attained. Thus 
methods for dealing with factor systems developed in dynamical systems could be 
applied to estimate channel capacities. The practicality of such estimates depends 
strongly on whether one is able to understand the structure of a class of compensation 
functions for a given channel. Probably, in many concrete cases, a relatively 
large family of compensation functions will suffice as well. 